A 3D-Stacked Memory Manycore Stencil Accelerator System
Abstract
Stencil operations are an important class of computational kernels that are pervasive in scientific simulations as well as in image processing. A key characteristic of this class of computation is its low operational intensity, i.e., the ratio of the number of memory accesses to the number of floating point operations performed is high. As a result, the performance of stencil operations implemented on general purpose computing systems is bounded by the memory bandwidth. Technologies such as 3D stacked memory can provide substantially more bandwidth than conventional memory systems and can enhance the performance of memory intensive computations like stencil kernels. In this paper, we leverage 3D stacked memory technology to design an accelerator for stencil computations. We show that for the best efficiency one needs to find the balance between computation and memory accesses that keeps all components consistently busy. We achieve this by exploring how blocking and caching schemes control the compute-to-memory ratio. Finally, we identify optimal design points that maximize performance.
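The bandwidth bound described in the abstract can be illustrated with a simple roofline-style estimate. The sketch below is not taken from the paper: the 7-point stencil, its flop and byte counts, and the machine figures (peak throughput and the two bandwidth values) are all hypothetical numbers chosen only to show how operational intensity and memory bandwidth together limit attainable performance, and how data reuse from blocking or caching shifts that limit.

```python
def operational_intensity(flops_per_point, bytes_per_point):
    """Floating point operations performed per byte moved to or from memory."""
    return flops_per_point / bytes_per_point


def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline-style bound: the lower of the compute and memory limits."""
    return min(peak_gflops, bandwidth_gbs * intensity)


# Assumed kernel: 7-point 3D stencil in double precision,
# counted as 7 multiplies + 6 adds = 13 flops per grid point.
FLOPS_PER_POINT = 13

# Assumed traffic per grid point (8 bytes per double):
#   no reuse:      7 loads + 1 store = 64 bytes
#   perfect reuse: 1 load  + 1 store = 16 bytes (e.g. via blocking/caching)
BYTES_NO_REUSE = 64
BYTES_PERFECT_REUSE = 16

# Hypothetical machine parameters, for illustration only.
PEAK_GFLOPS = 100.0
CONVENTIONAL_GBS = 25.0
STACKED_GBS = 200.0

for label, nbytes in [("no reuse", BYTES_NO_REUSE),
                      ("perfect reuse", BYTES_PERFECT_REUSE)]:
    oi = operational_intensity(FLOPS_PER_POINT, nbytes)
    for mem, bw in [("conventional", CONVENTIONAL_GBS),
                    ("stacked", STACKED_GBS)]:
        bound = attainable_gflops(PEAK_GFLOPS, bw, oi)
        print(f"{label:13s} {mem:12s} OI={oi:.3f} flop/B  bound={bound:.1f} GFLOP/s")
```

Under these assumed figures, the kernel stays memory bound whenever bandwidth times intensity falls below the compute peak, which is the balance point the abstract refers to.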
Similar resources
3D-Stacked Memory-Side Acceleration: Accelerator and System Design
Specialized hardware acceleration is an effective technique to mitigate the dark silicon problems. A challenge in designing on-chip hardware accelerators for data-intensive applications is how to efficiently transfer data between the memory hierarchy and the accelerators. Although the Processing-in-Memory (PIM) technique has the potential to reduce the overhead of data transfers, it is limited b...
PRO3D, Programming for Future 3D Manycore Architectures: Project's Interim Status
PRO3D tackles two important 3D technologies, that are Through Silicon Via (TSV) and liquid cooling, and investigates their consequences on stacked architectures and entire software development. In particular, memory hierarchies are being revisited and the thermal impact of software on the 3D stack is explored. As a key result, a software design flow based on the rigorous assembly of software co...
Cache based optimization of stencil computations : an algorithmic approach
We are witnessing a fundamental paradigm shift in computer design. Memory has been and is becoming more hierarchical. Clock frequency is no longer crucial for performance. The on-chip core count is doubling rapidly. The quest for performance is growing. These facts have led to complex computer systems which bestow high demands on scientific computing problems to achieve high performance. Stenc...
Resource Management Design in 3D-Stacked Multicore Systems for Improving Energy Efficiency
Technology scaling and increasing power densities have led to a transition from single-core to multi-core processors, and the trend is now moving towards many-core architectures. Hundreds of millions of transistors can now be integrated on a single chip, however, they cannot be fully exploited due to interconnect/memory latency, power consumption, and yield related challenges. 3D integration is...
PicoServer Revisited: On the Profitability of Eliminating Intermediate Cache Levels
The confluence of 3D stacking, emerging dense memory technologies, and low-voltage throughput-oriented manycore processors has sparked interest in single-chip servers as building blocks for scalable data-centric system design. These chips encapsulate an entire memory hierarchy within a 3D-stacked multi-die package. Stacking alters key assumptions of conventional hierarchy design, drastically in...
Publication date: 2015